1. INTRODUCTION

The need for cache memory arises because processor speed is higher than that of main memory. It is
worth noting that the memory wall problem [1], [2] in general-purpose computers also affects embedded
systems. The problem is that, beyond a certain point, an increase in processor speed contributes little
to the performance of a computer system [3]. This is because the net performance of a computer system
does not depend on the processor alone but also on other factors such as memory speed, input/output
devices and the bus configuration. Moreover, increasing processor speed means increasing the clock
rate, which has an adverse effect in terms of power dissipation [4]. The major setback in computing
system performance is the slow improvement in the speed of memory devices such as DRAM [5], which is
very slow compared to the processor. According to Wilkes [3], for two decades there has not been any
major improvement in memory speed. This means that the processor wastes most of its clock cycles
waiting to be serviced by main memory. In effect, memory speed drastically limits the performance of
computer systems. This has led to research on how to improve memory speed in order to increase the
speed of computer systems. The usual way of closing this speed gap is to place cache memory between the
main memory and the processor. Designers use cache memory, a smaller but faster memory device made
from static random access memory (SRAM) technology, to hold copies of memory locations in order to
service processor requests faster [6]. A typical embedded system level one (L1) data cache is found in
the Arm Cortex-A7 [7]. It has only two levels of cache memory, with split data and instruction caches.
The L1 data cache organization is set associative with a least recently used replacement policy. When
the processor requests a memory location, that location is first searched for in the cache and, if it is
not in the cache, it is fetched from main memory. This main memory location is first copied into the
cache before being loaded into the processor. The next time the same memory location is requested, it is
serviced from the cache, thereby hiding the real speed, or reducing the latency, of main memory. This
process helps to improve the speed of memory systems.

The idea behind the success of cache memory is locality of reference in a program [8]-[11]. Program
execution usually favours a portion of the main memory within a short period of time. According to
Eklov et al. [12], the factors that contribute to these locality patterns are sequential execution of
instructions, loops in a program, and the nature of the data items stored. First, the principle of
locality states that a program tends to reuse memory locations that it has recently accessed; this is
called temporal locality [13], [14]. Secondly, it states that memory locations close to a recently
referenced location are likely to be referenced in the near future; this is called spatial locality
[15]. These two properties of locality drive the effectiveness of cache memory. The block of main
memory that is referenced is copied into a cache line following the mapping procedure. When the cache
is full, a request for a memory location that is not in the cache means that an existing memory block
in the cache has to be evicted so that the recently referenced block can be copied into the cache. The
procedure by which a block already in the cache is selected for eviction and replaced by the most
recent request is called the cache replacement policy [16]. The most commonly used replacement
policies are least recently used (LRU), random and first-in first-out (FIFO). The choice of
replacement policy in a computer architecture depends on the performance optimization requirements and
cost. Another important aspect of cache memory design is its degree of associativity with the main
memory. The procedure by which a block of main memory is mapped into a line of the cache is called
cache associativity [17]. The well-known standard cache mapping techniques are set associative, direct
mapping and fully associative mapping [18]. Cache associativity is an important cache performance
criterion: the hit rate and miss rate of a cache depend on its type of associativity.
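The interaction between mapping and LRU replacement described above can be illustrated with a toy simulator. This is only a minimal sketch: the class name `SetAssociativeLRUCache`, its parameters and the example trace are hypothetical and are not taken from the tools used in this study.

```python
from collections import OrderedDict


class SetAssociativeLRUCache:
    """Toy set-associative cache with LRU replacement (illustrative only)."""

    def __init__(self, num_sets, ways, line_size):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # One ordered dict per set; key order tracks recency (oldest first).
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a hit, False on a miss (the block is then filled)."""
        block = address // self.line_size   # block (line) number
        index = block % self.num_sets       # set selected by the mapping
        lines = self.sets[index]
        if block in lines:
            lines.move_to_end(block)        # mark as most recently used
            self.hits += 1
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)       # evict the least recently used block
        lines[block] = True
        self.misses += 1
        return False


# Hypothetical 2-set, 2-way cache with 32-byte lines.
cache = SetAssociativeLRUCache(num_sets=2, ways=2, line_size=32)
trace = [0, 64, 128, 0, 64, 4]   # byte addresses; blocks 0, 2, 4, 0, 2, 0
results = [cache.access(a) for a in trace]
print(results)   # [False, False, False, False, False, True]
```

In this sketch, `ways=1` corresponds to direct mapping and `num_sets=1` to a fully associative cache. All even-numbered blocks in the trace compete for set 0, so blocks 0 and 2 are evicted before they are reused.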

In order to calculate or predict the locality and performance of cache memory analytically through
a performance parameter such as cache hit rate, latency or effectiveness, a metric that is
microarchitecture-independent and representative of the workload is required. Reuse distance and reuse
time [19], [20] are the two most popular metrics used in the prediction of cache performance. Reuse
distance is the number of unique intervening memory accesses between the use and the reuse of a
particular memory location, while reuse time is the total (absolute) number of memory accesses between
the use and reuse of that location. During program execution, the processor makes a series of
references to the various memory locations required for successful execution of the program. The flow
of these memory location references is called the memory reference stream. The collection of memory
locations referenced by the processor is called the memory profile [6], which is a very important input
parameter in cache analysis. The memory profile is useful because the stack distance or reuse distance
of a memory stream is obtained from it. The locality of a cache is determined through the analysis of
the program memory reference stream. If a memory reference stream exhibits significant temporal
locality (i.e., once an address is accessed, references to the same address are likely in the near
future) or spatial locality (i.e., once an address is accessed, references to neighbouring addresses
are likely in the near future), the cache hit rate, and hence overall performance, will be high [6].
According to Zhong [21], the smaller the reuse distances of the memory locations of an application, the
more the application obeys the principle of locality. This gives rise to an overall increase in cache
performance, but if the reuse distances are large there is a high probability that the application will
yield a low hit rate in the cache, thereby reducing cache performance. In this study, the metric used
was reuse distance because it describes the locality of memory accesses well and is closely related to
the behavior of the LRU policy, which allowed us to assess how well the cache is utilized. Furthermore,
reuse distance is machine-independent and deterministic, making it an ideal metric for use in
performance modeling.
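The difference between reuse distance and reuse time can be made concrete with a small example. The following is a naive sketch written directly from the two definitions above; the function name `reuse_profile` and the sample trace are hypothetical, and the quadratic-time approach is chosen for clarity rather than for profiling real workloads.

```python
def reuse_profile(trace):
    """For each access return (reuse_distance, reuse_time), or None on first use.

    reuse_distance: number of distinct addresses touched between the previous
                    use of an address and its reuse.
    reuse_time:     total number of accesses between the two.
    """
    last_seen = {}   # address -> index of its most recent access
    profile = []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            between = trace[last_seen[addr] + 1 : i]
            profile.append((len(set(between)), len(between)))
        else:
            profile.append(None)
        last_seen[addr] = i
    return profile


trace = ["a", "b", "c", "b", "a"]
profile = reuse_profile(trace)
print(profile)   # [None, None, None, (1, 1), (2, 3)]
```

For the final access to `a`, three accesses (`b`, `c`, `b`) intervene but only two of them are distinct, so the reuse time is 3 while the reuse distance is 2.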

The earliest background for analytical cache modeling is the work of Mattson et al. [8], who
developed a stack counting algorithm that processes the accesses of an application to memory locations
and records each reuse as the distance in the stack from the earlier access of that memory location to
its present access, called the stack distance or reuse distance. Their interest was not to calculate or
model cache behavior but to analyze the locality of an application; as a result, they did not produce a
cache model, but their result is the background to the present study. Enbody and Brehob [22] proposed
an analytical cache framework based on stack distance distribution to describe the behavior of a cache.
They developed one model to describe the locality of a reference stream and another to quantify cache
locality and behaviour. Their work is important to the present study because the cache model proposed
here is an extension of their second model. Pan and Jonsson [20] proposed an analytical cache framework
based on absolute reuse time distribution to describe the behavior of a cache.

Their study used Pin [23] to trace the workloads, which were SPEC CPU2006 [24] benchmarks, to obtain
the reuse times that were fed into their model for evaluation. They pointed out that profiling the
stack distance of an application incurs much overhead, so they resorted to the low-overhead metric of
absolute reuse distance, or reuse time. Beckman and Sanchez [25] also proposed a probabilistic cache
model based on absolute reuse distance for high-performance replacement policies. The model used reuse
times collected with a hardware monitor and models replacement policies as abstract ranking functions.
It is an age-based model comprising age, eviction and hit distribution models. Another study by Chen et
al. [10], which focused on optimal multi-level cache design, used the reuse distance metric to model
cache performance. They predicted the miss rate of a multi-level fully associative cache with LRU
replacement policy using the reuse distance metric. They assumed that a fully associative LRU cache can
be used as representative of a set associative cache, as there is no significant difference in their
performance given the same cache size. Just like the present study, they used a Pintool to collect
metrics and compared their results with results from Simplescalar simulation; their average error was
0.71% (L2) and 1.1% (L3). Gysi et al. [26] proposed an analytical cache model algorithm that predicts
the miss and hit rates of a fully associative cache with LRU replacement. They called their algorithm
Haystack, and it makes use of calculated program reuse distance to predict performance, just like the
present study. They used Polybench to evaluate their model and compared results with those from the
Uniprocessor simulator. They also validated their results using measured results from PAPI [27]. The
evaluation errors of their model were within 0.6% of the measured results.

This study will serve as a useful tool for cache performance analysis, as it will help computer
architects, researchers and students to gain insight into cache behavior under different
configurations. It will also help them in design decisions for system optimization and management. With
a proper understanding of program locality and of how to extract reuse distance from the memory profile
of a program, prediction of cache performance becomes easier. The aim of the present study is to
present a model of cache performance for embedded systems, and the specific objectives are: to develop
a mathematical cache hit rate estimation model; to characterize the locality of embedded system
workloads; to evaluate the model using the metrics from the characterized embedded system workloads;
and to compare the behaviour of the model with that of a standard cache simulator by applying the same
characterized embedded workloads and cache configuration parameters.


2. PROPOSED CACHE HIT RATE ESTIMATION MODEL 

To model cache behaviour, the aim is to use the reuse distance obtained using the MICA Pintool to
build our model. This means that, given a memory trace T, for every reuse distance d one wants to know
the probability of a cache hit. In a set associative cache with LRU replacement policy, the reuse of a
memory block at a distance less than the associativity of the cache is a hit. This is because, for any
memory block to be replaced in the cache, there must be as many as A distinct accesses to the set that
contains that memory block, where A is the degree of associativity of the cache. Therefore, the hit
ratio of all the accesses with reuse distance d less than or equal to A-1 is the cumulative fraction of
accesses with reuse distance up to d in the total number of accesses.

Now, for reuse distances d equal to A and above, the problem is how to determine the hit rate, given
that the reuse distance obtained is not the per-set reuse distance but that of the entire cache memory.
Because we do not have access to the set reuse distance, we estimate the probability that, for a given
cache set X which contains memory block E, the reuse of memory block E is a hit. To estimate this,
certain assumptions have to be made to make the study tractable. The following assumptions were made:
memory accesses are mapped into the various cache lines randomly; the mapping of cache lines into the
various cache sets is random; and reuse distances are independent and identically distributed (iid).
Some concepts need to be defined to make the derivation of the cache model less cumbersome. If, during
the entire run of a program, the processor reuses some memory blocks with reuse distance d, the
fraction of memory accesses with reuse distance d is the ratio of the number of memory accesses with
reuse distance d to the total number of memory accesses. The reuse distance distribution R_d, defined
as the fraction of memory accesses with reuse distance exactly d, is given by (1).


$$R_d = \frac{\text{Frequency of reuse distance},\ f_d}{\text{Total memory accesses},\ T} \tag{1}$$
where f_d is the frequency of reuse distance exactly equal to d, and T is the total number of memory
accesses. The cumulative reuse distance distribution R_(i≤d) is defined as the probability of obtaining
a reuse distance of at most d over all memory accesses, as in (2):


$$R_{(i \le d)} = \sum_{i=0}^{d} R_i \tag{2}$$


In a fully associative cache, (2) gives the hit rate of the cache when d is equal to B-1, where B is
the number of blocks in the cache. In a set associative cache, (2) is used to determine the hit rate
for reuse distances d less than or equal to A-1, where A is the associativity of the cache. In this
study, the aim is to model a set associative L1 cache, for which some of the reuse distances will be
equal to or greater than the associativity of the cache.
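Equations (1) and (2) amount to normalizing a reuse distance histogram and accumulating it. The sketch below illustrates this under the assumption that the histogram covers every memory access, so that the sum of its frequencies equals T; the function names and the histogram values are hypothetical.

```python
def reuse_distribution(histogram):
    """Map each reuse distance d to R_d = f_d / T, as in (1)."""
    total = sum(histogram.values())   # T, assuming the histogram covers all accesses
    return {d: f / total for d, f in histogram.items()}


def cumulative_reuse(distribution, d):
    """Fraction of accesses with reuse distance at most d, as in (2)."""
    return sum(r for dist, r in distribution.items() if dist <= d)


# Hypothetical histogram: reuse distance -> frequency f_d
histogram = {0: 50, 1: 30, 2: 15, 8: 5}
R = reuse_distribution(histogram)

# Fully associative cache with B = 4 blocks: hit rate is the cumulative value at B - 1.
fully_associative_hit_rate = cumulative_reuse(R, 4 - 1)
print(fully_associative_hit_rate)   # approximately 0.95
```

With these made-up numbers, only the 5% of accesses at reuse distance 8 would miss in a 4-block fully associative LRU cache.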

We start by finding the probability that a cache access falls in cache set X. With S sets in the
cache, the probability of an access being in set X is X_1 = 1/S and the probability of it not being in
set X is X_0 = (1 - 1/S). We take X_1 as the probability of success, p = 1/S, and X_0 as the
probability of failure, q = (1 - 1/S). This can be stated in the form of a binomial probability by
finding the probability of obtaining successes in d (reuse distance) Bernoulli trials, as shown in (3):


$$P(\text{Success}) = \sum_{k=0}^{d} \binom{d}{k} p^{k} q^{d-k} \tag{3}$$


where k = 0, 1, 2, ..., d is the number of successes.

For reuses of a memory block whose reuse distance is greater than or equal to the associativity of
the cache, the reuse can be a cache hit if and only if the number of intervening accesses in cache set
X is at most A-1. To calculate the probability of a hit at a reuse distance greater than or equal to
the associativity A, the probability of obtaining at most A-1 unique intervening accesses in cache set
X is obtained. This can be derived in the form of a cumulative binomial probability distribution
function, as shown in (4).


$$P_{(k \le A-1)} = \sum_{k=0}^{A-1} \binom{d}{k} p^{k} q^{d-k} \tag{4}$$


Substituting for p and q in (4) gives (5): 


$$P_{(k \le A-1)} = \sum_{k=0}^{A-1} \binom{d}{k} \left(\frac{1}{S}\right)^{k} \left(1 - \frac{1}{S}\right)^{d-k} \tag{5}$$
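The cumulative binomial probability in (5) can be evaluated directly. The following is a minimal sketch using Python's standard `math.comb`; the function name `hit_probability` and the example values are illustrative assumptions, and the upper summation limit is capped at d so that the function also returns 1 when d is smaller than A.

```python
from math import comb


def hit_probability(d, associativity, num_sets):
    """Probability that at most A-1 of d intervening accesses map to the same set, as in (5)."""
    p = 1.0 / num_sets   # probability that an access falls in set X
    q = 1.0 - p          # probability that it does not
    # k cannot exceed d, so cap the range; for d < A this sums the full binomial (= 1).
    upper = min(associativity, d + 1)
    return sum(comb(d, k) * p**k * q**(d - k) for k in range(upper))


# Hypothetical case: reuse distance 3 in a 2-way cache with 4 sets.
prob = hit_probability(d=3, associativity=2, num_sets=4)
print(prob)   # 0.84375
```

For this case the sum has two terms, (3/4)^3 and 3 × (1/4) × (3/4)^2, each equal to 27/64.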


Equation (5) gives the probability of a hit for accesses with reuse distance equal to or greater
than A. To calculate the hit rate at such a reuse distance, we multiply (5) by (1), the fraction of
accesses that have that reuse distance, which gives (6) and (7).


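$$P_{(d_i)} = \frac{\text{No. of accesses with reuse distance} = d}{\text{Total no. of memory accesses}} \times \sum_{k=0}^{A-1} \binom{d}{k} \left(\frac{1}{S}\right)^{k} \left(1 - \frac{1}{S}\right)^{d-k} \tag{6}$$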


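$$P_{(d_i)} = R_d \times \sum_{k=0}^{A-1} \binom{d}{k} \left(\frac{1}{S}\right)^{k} \left(1 - \frac{1}{S}\right)^{d-k} \tag{7}$$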


Equations (6) and (7) give the probability of a hit at a given reuse distance equal to or greater
than the degree of associativity of the cache. The total hit rate over all reuse distances equal to A
and above is then given in (8).


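$$P_{(d \ge A)\,\text{total}} = \sum_{d \ge A} R_d \times \sum_{k=0}^{A-1} \binom{d}{k} \left(\frac{1}{S}\right)^{k} \left(1 - \frac{1}{S}\right)^{d-k} \tag{8}$$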


where R_d is the reuse distance distribution of memory accesses with reuse distance d.

Equation (8) gives the cache hit rate for memory block reuses with reuse distance greater than or
equal to the associativity A of the cache. In order to obtain the overall hit rate of an application
given an LRU cache with associativity A and S sets, the sum of the hit rates over every memory reuse
distance,


Modeling cache performance for embedded systems (V. C. Chijindu) 


29140 ISSN: 2302-9285 


including those with reuse distance less than or equal to A-1, is required. Therefore, the cache hit
rate of an application is given by (9), which is derived by taking the cumulative reuse distance
distribution up to A-1 and adding it to (8).


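$$\text{Hit Rate} = \sum_{d=0}^{A-1} R_d + \sum_{d \ge A} R_d \times \sum_{k=0}^{A-1} \binom{d}{k} \left(\frac{1}{S}\right)^{k} \left(1 - \frac{1}{S}\right)^{d-k} \tag{9}$$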


Equation (9) gives the cache hit rate of an application in a set associative cache with least
recently used (LRU) replacement policy.
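As an illustration of how (9) can be evaluated, the sketch below sums the two terms over a reuse distance distribution. It is not the implementation used in this study: the function name `estimate_hit_rate` and the sample distribution are hypothetical, and the distribution is assumed to map each exact reuse distance d to its fraction R_d of all accesses.

```python
from math import comb


def estimate_hit_rate(distribution, associativity, num_sets):
    """Evaluate (9): certain hits for d <= A-1 plus binomially weighted hits for d >= A."""
    p = 1.0 / num_sets
    q = 1.0 - p
    hit_rate = 0.0
    for d, r_d in distribution.items():
        if d < associativity:
            hit_rate += r_d                  # first term of (9)
        else:
            prob = sum(comb(d, k) * p**k * q**(d - k)
                       for k in range(associativity))
            hit_rate += r_d * prob           # second term of (9)
    return hit_rate


# Hypothetical distribution R_d (fractions of all accesses).
R = {0: 0.5, 1: 0.3, 3: 0.2}
rate = estimate_hit_rate(R, associativity=2, num_sets=4)
print(rate)   # approximately 0.96875
```

Here the accesses at distances 0 and 1 always hit in a 2-way cache (0.5 + 0.3), and the accesses at distance 3 hit with probability 0.84375, contributing 0.2 × 0.84375 = 0.16875.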


3. RESEARCH DESIGN 
This section presents the data, tools, benchmarks and experimental setup for this study. We also 
explore the methods employed in the implementation of this study. 


3.1. Metrics 

In this experiment, we chose a microarchitecture-independent metric that characterizes the memory
usage of embedded system workloads. The metric selected is reuse distance, which was profiled using the
MICA Pintool [28]. This metric is microarchitecture-independent because it characterizes only the
memory behavior of workloads for a particular instruction set architecture (ISA). It is flexible to use
and remains the same across different microarchitectures within the same ISA. Cache parameters such as
cache sizes and configurations were also used along with the reuse distance to predict cache
performance, which in this case was the cache hit rate.


3.2. Benchmarks 

We used the Mibench benchmarks [29] to evaluate the cache model presented. Mibench is a benchmark
suite that follows the EEMBC benchmark [30] model. It is divided into six groups representing six
domains of embedded systems. Three benchmarks were chosen from Mibench: bitcount and basicmath from the
automotive/industrial domain, and the FFT benchmark from the network domain. These benchmarks were used
to evaluate the model presented in this paper.


3.3. Tools 

In order to profile the memory reuse distance, Intel Pin [23], a dynamic instrumentation engine, was
used along with a Pintool called MICA developed by Eeckhout and Hoste [28]. This Pintool is capable of
characterizing the memory reuse distance of embedded system workloads. The results of the tool are
generated in bins, with each bin representing the frequency of a particular range of reuse distances.
The MICA Pintool can be configured, through its configuration file, to profile a particular number of
instructions or the full instruction stream. The cache model presented is built using the Bernoulli
cumulative binomial probability, as shown in section 2. After evaluating the model using the profiled
reuse distance metrics and cache configuration parameters, we used sim-cheetah [31] from the
Simplescalar simulator suite [32] to simulate the selected benchmarks with the corresponding cache
sizes and configurations.


3.4. Implementation 

All results from this study were generated using an Intel® Core(TM) i3 processor, Ubuntu 10.10 and
gcc-3.4. Intel Pin-3.4 was used along with the MICA_v0.40 Pintool to characterize the applications and
generate reuse distance profiles. The sim-cheetah simulator from the Simplescalar simulator suite v4.0
was used to compare against the results of the cache model. We implemented the cache model using the
following cache configurations: 2-way, 4-way, 8-way and 16-way; the cache sizes examined were 4 KB,
8 KB, 16 KB, 32 KB and 64 KB. The cache line size used in both the model evaluation and the sim-cheetah
simulation was 32 bytes. These cache parameters were chosen to represent a real-world level one (L1)
data cache in an embedded processor. For the model evaluation, the three selected benchmarks were
compiled using gcc-3.4 with O3 optimization. First, the compiled applications were characterized by
their memory reuse distance using the MICA Pintool. Then we built a reuse distance histogram for each
of the three benchmarks to show their reuse distance characteristics. In order to evaluate the model,
the profiled memory reuse distances (d), along with the corresponding cache parameters, were
substituted into our cache model to estimate the cache hit rate for the given application. For each
benchmark, these steps were taken for all cache sizes and configurations, one at a time, to arrive at
the results presented. Finally, sim-cheetah was used to simulate the selected benchmarks, configured
for the same cache sizes and configurations as in the cache model evaluation. Sim-cheetah reports the
cache miss rate, which was converted to hit rate using the complementary relationship between the two
parameters. The results from the cache model and sim-cheetah are presented alongside each other in a
table for comparison. We also calculated the mean absolute errors between the cache model results and
those from the sim-cheetah simulator. These errors are presented in the tables of results.
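The bookkeeping in this procedure can be sketched as follows. The helper names are hypothetical; the number of sets is derived with the usual relation S = cache size / (line size × associativity), and a 32-byte line is assumed here for illustration. The hit rates in the example are the first two basicmath rows reported in the results.

```python
def num_sets(cache_bytes, line_bytes, ways):
    """Number of sets S = cache size / (line size * associativity)."""
    return cache_bytes // (line_bytes * ways)


def hit_rate_from_miss_rate(miss_rate_percent):
    """Hit rate and miss rate are complementary percentages."""
    return 100.0 - miss_rate_percent


def mean_absolute_error(model, reference):
    """Mean of |model - reference| over paired hit rates."""
    return sum(abs(m - r) for m, r in zip(model, reference)) / len(model)


# Assumed geometry: 4 KB cache, 32-byte lines, 16-way.
sets = num_sets(4 * 1024, 32, 16)
print(sets)   # 8

# Model vs. simulator hit rates (%) for two configurations.
mae = mean_absolute_error([92.3929, 91.7408], [99.8441, 99.8357])
print(round(mae, 4))
```

The value of S obtained this way, together with the associativity, is what the model needs as input for each cache configuration.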


4. RESULTS AND DISCUSSION 

In this section, we evaluate the model given in (9). Before that, we characterized the memory reuse
distance of the three selected Mibench benchmarks and built their reuse distance histograms, as shown
in Figures 1-3. Figure 1 shows the reuse distance histogram of the bitcount benchmark, with more than
99.9 percent of the memory accesses having a reuse distance below 24, while the largest reuse distance
for the benchmark is 384. In Figure 2, which is the reuse distance histogram of the basicmath
benchmark, the largest memory reuse distance obtained is 196608, compared to 384 for bitcount. Also,
more than 99.9 percent of the memory accesses in the basicmath benchmark have a reuse distance of at
most 192. Figure 3 shows the reuse distance histogram of the FFT benchmark. Its largest reuse distance
is 49152, with a frequency of 76562. Over 99.8% of the memory accesses have a reuse distance below 384,
meaning that the benchmark's locality is good.

Figure 3. Reuse distance histogram of FFT benchmark


After characterizing the benchmarks, the reuse distance profiles were used to evaluate our cache
model. The benchmarks were also simulated using the sim-cheetah simulator under the same cache
parameters as in the cache model evaluation, for comparison of results. Both sets of results are shown
alongside each other in a table with the calculated absolute error between them. Tables 1, 2 and 3 show
the hit rates estimated by the cache model and those from the sim-cheetah simulator for the bitcount,
basicmath and FFT benchmarks respectively.


Table 1. Results on bitcount benchmark for different cache configurations and sizes with errors

Cache configuration   Hit rate (%), proposed model   Hit rate (%), sim-cheetah   Absolute error
4 KB 16-way           99.9986                        99.9998                     0.0012
4 KB 8-way            99.9984                        99.9998                     0.0014
4 KB 4-way            99.9891                        99.9998                     0.0107
4 KB 2-way            99.6254                        99.9998                     0.3730
8 KB 16-way           99.9002                        99.9998                     0.0996
8 KB 8-way            99.9982                        99.9998                     0.0016
8 KB 4-way            99.9993                        99.9998                     0.0005
8 KB 2-way            99.9994                        99.9998                     0.0004
16 KB 16-way          99.9995                        99.9998                     0.0003
16 KB 8-way           99.9995                        99.9998                     0.0003
16 KB 4-way           99.9994                        99.9998                     0.0004
16 KB 2-way           99.9739                        99.9998                     0.0259
32 KB 16-way          99.9930                        99.9998                     0.0068
32 KB 8-way           99.9995                        99.9998                     0.0003
32 KB 4-way           99.9995                        99.9998                     0.0003
32 KB 2-way           99.9995                        99.9998                     0.0003
64 KB 16-way          99.9995                        99.9998                     0.0003
64 KB 8-way           99.9995                        99.9998                     0.0003
64 KB 4-way           99.9995                        99.9998                     0.0003
64 KB 2-way           99.9979                        99.9998                     0.0019
Mean values           99.9734                        99.9998                     0.02629


Table 2. Results on basicmath benchmark for different cache configurations and sizes with errors

Cache configuration   Hit rate (%), proposed model   Hit rate (%), sim-cheetah   Absolute error
4 KB 16-way           92.3929                        99.8441                     7.4512
4 KB 8-way            91.7408                        99.8357                     8.0949
4 KB 4-way            90.9682                        99.8382                     8.8700
4 KB 2-way            89.3096                        99.7936                     10.4840
8 KB 16-way           98.9793                        99.9788                     0.9995
8 KB 8-way            98.2066                        99.9751                     1.7685
8 KB 4-way            97.0596                        99.9687                     2.9091
8 KB 2-way            95.1039                        99.9263                     4.8224
16 KB 16-way          99.9504                        99.9999                     0.0495
16 KB 8-way           99.8778                        99.9999                     0.1221
16 KB 4-way           99.4657                        99.9995                     0.5338
16 KB 2-way           98.2240                        99.9907                     1.7667
32 KB 16-way          99.9546                        99.9999                     0.0453
32 KB 8-way           99.9635                        99.9999                     0.0364
32 KB 4-way           99.9023                        99.9999                     0.0976
32 KB 2-way           99.4299                        99.9999                     0.5700
64 KB 16-way          99.9546                        99.9999                     0.0453
64 KB 8-way           99.9546                        99.9999                     0.0453
64 KB 4-way           99.9503                        99.9999                     0.0496
64 KB 2-way           99.8096                        99.9999                     0.1903
Mean values           97.5099                        99.9575                     2.4476


Figures 4-6 also show graphs of the hit rates predicted by the cache model and simulated by
sim-cheetah against the different cache configurations for the bitcount, basicmath and FFT benchmarks.
In summary, the mean errors of the cache model for the bitcount, basicmath and FFT benchmarks are
0.0263%, 2.4476% and 1.9000% respectively. Hence, the mean error of the model over the three benchmarks
is 1.4579%. These results are comparable to those obtained in [20] and [22], which gave model mean
errors for LRU of 1.9% and 2.17% respectively.
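The overall figure is the arithmetic mean of the three per-benchmark mean errors quoted above; written out as a check:

```latex
\[
\frac{0.0263 + 2.4476 + 1.9000}{3} = \frac{4.3739}{3} \approx 1.458
\]
```

This agrees with the reported 1.4579% up to rounding in the last digit.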

Table 3. Results on FFT benchmark for different cache configurations and sizes with errors

Cache configuration   Hit rate (%), proposed model   Hit rate (%), sim-cheetah   Absolute error
4 KB 16-way           93.4509                        99.8111                     6.3602
4 KB 8-way            93.4834                        99.8119                     6.3285
4 KB 4-way            93.0136                        99.6824                     6.6692
4 KB 2-way            91.0497                        98.3705                     7.3208
8 KB 16-way           98.8278                        99.8277                     0.9999
8 KB 8-way            98.1577                        99.8302                     1.6725
8 KB 4-way            97.3649                        99.8054                     2.4405
8 KB 2-way            95.8140                        98.9278                     3.1138
16 KB 16-way          99.7724                        99.8307                     0.0583
16 KB 8-way           99.7016                        99.8307                     0.1291
16 KB 4-way           99.3466                        99.8236                     0.4770
16 KB 2-way           98.3499                        99.8167                     1.4668
32 KB 16-way          99.7802                        99.8308                     0.0506
32 KB 8-way           99.7788                        99.8308                     0.0520
32 KB 4-way           99.7321                        99.8308                     0.0987
32 KB 2-way           99.3428                        99.8297                     0.4869
64 KB 16-way          99.7802                        99.8308                     0.0506
64 KB 8-way           99.7802                        99.8309                     0.0507
64 KB 4-way           99.7761                        99.8310                     0.0549
64 KB 2-way           99.6587                        99.8310                     0.1723
Mean values           97.7981                        99.7007                     1.90000

Figure 4. Comparison of model and sim-cheetah hit rate for bitcount benchmarks

Figure 5. Comparison of model and sim-cheetah hit rate for basicmath benchmarks

Figure 6. Comparison of model and sim-cheetah hit rate for FFT benchmarks


5. CONCLUSION 

The embedded workloads were characterized to collect reuse distance metrics using the MICA Pintool.
These metrics, with the corresponding cache configuration parameters, were applied to the developed
model to estimate the cache hit rate. The results were compared with those from sim-cheetah, a standard
simulator from the Simplescalar simulator suite, and were observed to follow a similar trend. The mean
error for each benchmark was below 5%, which is within acceptable limits, showing that the model can be
used to estimate cache hit rates and to explore cache design options. The model proved to be feasible
since its results were comparable to those of other existing cache models. The results also show that
the smaller the reuse distance, the better the performance of the cache. In conclusion, the model as
presented can be used to estimate cache memory behavior with reasonable accuracy.